Puebla
Where is my Glass Slipper? AI, Poetry and Art
This literature review interrogates the intersections between artificial intelligence, poetry, and art, offering a comprehensive exploration of both historical evolution and current debates in digital creative practices. It traces the development of computer-generated poetry from early template-based systems to generative models, critically assessing evaluative frameworks such as adaptations of the Turing Test, the FACE model, and ProFTAP. It also examines how these frameworks endeavour to measure creativity, semantic coherence, and cultural relevance in AI-generated texts, whilst highlighting the persistent challenges in replicating the nuance of human poetic expression. The review contributes a Marketing Theory discussion that deconstructs the figurative marketing narratives employed by AI companies, which utilise sanitised language and anthropomorphic metaphors to humanise their technologies. This discussion reveals the reductive nature of such narratives and underscores the tension between algorithmic precision and the realities of human creativity. The review also incorporates an auto-ethnographic account that offers a self-reflexive commentary on its own composition. By acknowledging the use of AI in crafting this review, the auto-ethnographic account destabilises conventional notions of authorship and objectivity, resonating with deconstruction and challenging logocentric assumptions in academic discourse. Ultimately, the review calls for a re-evaluation of creative processes that recognises the interdependence of technological innovation and human subjectivity. It advocates for interdisciplinary dialogue addressing ethical, cultural, and philosophical concerns, while reimagining the boundaries of artistic production.
Educating a Responsible AI Workforce: Piloting a Curricular Module on AI Policy in a Graduate Machine Learning Course
Weichert, James, Eldardiry, Hoda
As artificial intelligence (AI) technologies begin to permeate diverse fields--from healthcare to education--consumers, researchers and policymakers are increasingly raising concerns about whether and how AI is regulated. It is therefore reasonable to anticipate that alignment with principles of 'ethical' or 'responsible' AI, as well as compliance with law and policy, will form an increasingly important part of AI development. Yet, for the most part, the conventional computer science curriculum is ill-equipped to prepare students for these challenges. To this end, we seek to explore how new educational content related to AI ethics and AI policy can be integrated into both ethics- and technical-focused courses. This paper describes a two-lecture AI policy module that was piloted in a graduate-level introductory machine learning course in 2024. The module, which includes an in-class active learning game, is evaluated using data from student surveys before and after the lectures, and pedagogical motivations and considerations are discussed. We find that the module is successful in engaging otherwise technically oriented students on the topic of AI policy, increasing student awareness of the social impacts of a variety of AI technologies and developing student interest in the field of AI regulation.
Introduction
The explosive growth of artificial intelligence (AI) technologies is widely documented and increasingly evident in everyday life: some responses from the search engine Google now include an "AI Overview" inserted before the first webpage link; companies like Tesla and Waymo have seen success in implementing partial or full autonomous driving in vehicles on live roads; and "Apple Intelligence" was the flagship feature for the launch of Apple's new smartphone in fall 2024. Yet what legal or policy response this technological growth will precipitate is less certain [1, 2].
Nevertheless, it should be expected that the development and enactment of regulatory frameworks for AI will demand AI engineers with a command not only of the technical intricacies of AI models, but also of the policy and regulatory landscape for AI development and compliance [3].
Document-Level Sentiment Analysis of Urdu Text Using Deep Learning Techniques
Document-level Urdu Sentiment Analysis (SA) is a challenging Natural Language Processing (NLP) task, as it deals with large documents in a resource-poor language. Large documents contain many words that express different viewpoints. Deep learning (DL) models comprise complex neural network architectures that can learn diverse features of the data to classify various sentiments. Besides audio, image and video classification, DL algorithms are now extensively used in text-based classification problems. To explore powerful DL techniques for Urdu SA, we have applied five different DL architectures, namely Bidirectional Long Short Term Memory (BiLSTM), Convolutional Neural Network (CNN), Convolutional Neural Network with Bidirectional Long Short Term Memory (CNN-BiLSTM), Bidirectional Encoder Representations from Transformers (BERT). In this paper, we propose a DL hybrid model that integrates BiLSTM with a Single Layer Multi Filter Convolutional Neural Network (BiLSTM-SLMFCNN). The proposed and baseline techniques are applied to the Urdu Customer Support data set and the IMDB Urdu movie review data set, using pretrained Urdu word embeddings that are suitable for SA at the document level. The results of these techniques are evaluated, and our proposed model outperforms all other DL techniques for Urdu SA. BiLSTM-SLMFCNN outperformed the baseline DL models and achieved 83%, 79%, 83% and 94% accuracy on the small, medium and large sized IMDB Urdu movie review data sets and the Urdu Customer Support data set, respectively.
The Kinetics Observer: A Tightly Coupled Estimator for Legged Robots
Demont, Arnaud, Benallegue, Mehdi, Benallegue, Abdelaziz, Gergondet, Pierre, Dallard, Antonin, Cisneros, Rafael, Murooka, Masaki, Kanehiro, Fumio
In this paper, we propose the "Kinetics Observer", a novel estimator addressing the challenge of state estimation for legged robots using proprioceptive sensors (encoders, IMU and force/torque sensors). Based on a Multiplicative Extended Kalman Filter, the Kinetics Observer allows the real-time simultaneous estimation of contact and perturbation forces and of the robot's kinematics, with the latter accurate enough to perform proprioceptive odometry. Thanks to a visco-elastic model of the contacts linking their kinematics to those of the centroid of the robot, the Kinetics Observer ensures a tight coupling between the whole-body kinematics and dynamics of the robot. This coupling entails a redundancy of the measurements that enhances the robustness and the accuracy of the estimation. This estimator was tested on two humanoid robots performing long-distance walking on even terrain and non-coplanar multi-contact locomotion.
Labeling Comic Mischief Content in Online Videos with a Multimodal Hierarchical-Cross-Attention Model
Baharlouei, Elaheh, Shafaei, Mahsa, Zhang, Yigeng, Escalante, Hugo Jair, Solorio, Thamar
We address the challenge of detecting questionable content in online media, specifically the subcategory of comic mischief. This type of content combines elements such as violence, adult content, or sarcasm with humor, making it difficult to detect. Employing a multimodal approach is vital to capture the subtle details inherent in comic mischief content. To tackle this problem, we propose a novel end-to-end multimodal system for the task of comic mischief detection. As part of this contribution, we release a novel dataset for the targeted task consisting of three modalities: video, text (video captions and subtitles), and audio. We also design a HIerarchical Cross-attention model with CAPtions (HICCAP) to capture the intricate relationships among these modalities. The results show that the proposed approach makes a significant improvement over robust baselines and state-of-the-art models for comic mischief detection and its type classification. This emphasizes the potential of our system to empower users to make informed decisions about the online content they choose to see. In addition, we conduct experiments on the UCF101, HMDB51, and XD-Violence datasets, comparing our model against other state-of-the-art approaches and showcasing the outstanding performance of our proposed model in various scenarios.
ATSumm: Auxiliary information enhanced approach for abstractive disaster Tweet Summarization with sparse training data
Garg, Piyush Kumar, Chakraborty, Roshni, Dandapat, Sourav Kumar
The abundance of situational information on Twitter makes it challenging for users to manually discern vital and relevant information during disasters. A concise and human-interpretable overview of this information helps decision-makers implement an efficient and quick disaster response. Existing abstractive summarization approaches can be categorized as sentence-based or key-phrase-based approaches. This paper focuses on the sentence-based approach, which is typically implemented as a dual-phase procedure in the literature. The initial phase, known as the extractive phase, involves identifying the most relevant tweets. The subsequent phase, referred to as the abstractive phase, entails generating a more human-interpretable summary. In this study, we adopt the methodology from prior research for the extractive phase. For the abstractive phase of summarization, most existing approaches employ deep learning-based frameworks, which can either be pre-trained or require training from scratch. However, to achieve the appropriate level of performance, both kinds of framework need substantial training data, which is not readily available. This work presents an Abstractive Tweet Summarizer (ATSumm) that effectively addresses the issue of data sparsity by using auxiliary information. We introduce the Auxiliary Pointer Generator Network (AuxPGN) model, which utilizes a unique attention mechanism called Key-phrase attention. This attention mechanism incorporates auxiliary information in the form of key-phrases and their corresponding importance scores from the input tweets. We evaluate the proposed approach by comparing it with 10 state-of-the-art approaches across 13 disaster datasets. The evaluation results indicate that ATSumm achieves superior performance compared to state-of-the-art approaches, with an improvement of 4-80% in ROUGE-N F1-score.
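The ROUGE-N F1-score mentioned above can be illustrated with a minimal sketch. This is a simplified version that assumes lowercased whitespace tokenization and a single reference summary; it is not the official ROUGE toolkit and not necessarily the exact evaluation setup used for ATSumm. The function names and the example tweets are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(candidate, reference, n=1):
    """Simplified ROUGE-N F1 between a candidate and one reference summary.

    Assumption: tokens are obtained by lowercasing and splitting on whitespace.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Clipped overlap: each n-gram counts at most as often as it occurs in both.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical example summaries, for illustration only.
generated = "flood waters rising in city center"
gold = "flood waters are rising in the city center"
rouge_1 = rouge_n_f1(generated, gold, n=1)
rouge_2 = rouge_n_f1(generated, gold, n=2)
```

Here `rouge_1` rewards shared words, while `rouge_2` is stricter because it only rewards shared adjacent word pairs.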
Bayesian and Convolutional Networks for Hierarchical Morphological Classification of Galaxies
Serrano-Pérez, Jonathan, Hernández, Raquel Díaz, Sucar, L. Enrique
This work focuses on the morphological classification of galaxies following the Hubble sequence, in which the different classes are arranged in a hierarchy. The proposed method, BCNN, is composed of two main modules. First, a convolutional neural network (CNN) is trained with images of the different classes of galaxies (image augmentation is carried out to balance some classes); the CNN outputs the probability for each class of the hierarchy, and its outputs/predictions feed the second module. The second module consists of a Bayesian network that represents the hierarchy and helps to improve the prediction accuracy by combining the predictions of the first phase while maintaining the hierarchical constraint (in a hierarchy, an instance associated with a node must be associated with all its ancestors), through probabilistic inference over the Bayesian network, so that a consistent prediction is obtained. Different images from the Hubble telescope have been collected and labeled by experts and are used to perform the experiments. The results show that BCNN performed better than several CNNs in multiple evaluation measures, reaching the following scores: 67% in exact match, 78% in accuracy, and 83% in hierarchical F-measure.
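The hierarchical constraint and the hierarchical F-measure mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the toy hierarchy and label names are hypothetical, the metric follows a commonly used definition (precision and recall over ancestor-augmented label sets) that may differ in detail from the one used for BCNN, and the code does not reproduce the Bayesian network inference.

```python
# Hypothetical toy hierarchy (child -> parent); top-level classes have no entry.
PARENT = {
    "barred_spiral": "spiral",
    "unbarred_spiral": "spiral",
}


def with_ancestors(labels):
    """Close a label set under the hierarchical constraint:
    every label brings along all of its ancestors."""
    closed = set()
    for label in labels:
        node = label
        # Walk up until reaching a top-level class or an already visited node.
        while node is not None and node not in closed:
            closed.add(node)
            node = PARENT.get(node)
    return closed


def hierarchical_f1(predicted, true):
    """Hierarchical F-measure over ancestor-augmented label sets."""
    pred_set = with_ancestors(predicted)
    true_set = with_ancestors(true)
    shared = len(pred_set & true_set)
    if shared == 0:
        return 0.0
    h_precision = shared / len(pred_set)
    h_recall = shared / len(true_set)
    return 2 * h_precision * h_recall / (h_precision + h_recall)


# Predicting the wrong leaf under the right parent still earns partial credit.
sibling_score = hierarchical_f1({"barred_spiral"}, {"unbarred_spiral"})
# Predicting only the parent of the true leaf is precise but incomplete.
parent_score = hierarchical_f1({"spiral"}, {"barred_spiral"})
```

Unlike exact match, this measure gives partial credit to predictions that are wrong at the leaf but correct higher in the hierarchy.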
Semi-Supervised Hierarchical Multi-Label Classifier Based on Local Information
Serrano-Pérez, Jonathan, Sucar, L. Enrique
Scarcity of labeled data is a common problem in supervised classification, since hand-labeling can be time-consuming, expensive or difficult; on the other hand, large amounts of unlabeled information can be found. The scarcity of labeled data is even more pronounced in hierarchical classification, because the data of a node is split among its children, which results in few instances associated with the deepest nodes of the hierarchy. This work proposes the semi-supervised hierarchical multi-label classifier based on local information (SSHMC-BLI), which can be trained with labeled and unlabeled data to perform hierarchical classification tasks. The method can be applied to any type of hierarchical problem; here we focus on the most difficult case: hierarchies of DAG type, where the instances can be associated with multiple paths of labels, which can finish in an internal node. SSHMC-BLI builds pseudo-labels for each unlabeled instance from the paths of labels of its labeled neighbors, while it considers whether the unlabeled instance is similar to its neighbors. Experiments on 12 challenging datasets from functional genomics show that using unlabeled data along with labeled data can improve, with statistical significance, the performance of a supervised hierarchical classifier trained only on labeled data.
Towards Dog Bark Decoding: Leveraging Human Speech Processing for Automated Bark Classification
Abzaliev, Artem, Espinosa, Humberto Pérez, Mihalcea, Rada
Similar to humans, animals make extensive use of verbal and non-verbal forms of communication, including a large range of audio signals. In this paper, we address dog vocalizations and explore the use of self-supervised speech representation models pre-trained on human speech to address dog bark classification tasks that find parallels in human-centered tasks in speech recognition. We specifically address four tasks: dog recognition, breed identification, gender classification, and context grounding. We show that using speech embedding representations significantly improves over simpler classification baselines. Further, we also find that models pre-trained on large human speech acoustics can provide additional performance boosts on several tasks.
Knowledge Transfer for Cross-Domain Reinforcement Learning: A Systematic Review
Serrano, Sergio A., Martinez-Carranza, Jose, Sucar, L. Enrique
Reinforcement Learning (RL) provides a framework in which agents can be trained, via trial and error, to solve complex decision-making problems. Learning with little supervision causes RL methods to require large amounts of data, which renders them too expensive for many applications (e.g. robotics). By reusing knowledge from a different task, knowledge transfer methods present an alternative to reduce the training time in RL. Given how severe data scarcity can be, there has been growing interest in methods capable of transferring knowledge across different domains (i.e. problems with different representations) due to the flexibility they offer. This review presents a unifying analysis of methods focused on transferring knowledge across different domains. Through a taxonomy based on a transfer-approach categorization, and a characterization of works based on their data-assumption requirements, the objectives of this article are to 1) provide a comprehensive and systematic review of knowledge transfer methods for the cross-domain RL setting, 2) categorize and characterize these methods to provide an analysis based on relevant features such as their transfer approach and data requirements, and 3) discuss the main challenges regarding cross-domain knowledge transfer, as well as ideas for future directions worth exploring to address these problems.